These are a general class of prediction algorithms which are motivated by the "pro-hotspot" algorithm, but which work in continuous space and time, and so avoid the problem of choosing how to compute distances on a grid. In principle, this algorithm, with well-chosen kernels, should outperform all the "classical" algorithms. Unfortunately, the extensive use of KDEs (kernel density estimates) means that this algorithm can be a little slow.
(1) is motivation, and (2) contains an example of this class of algorithm (see page 11, equation (2) of that paper, for example).
A grid is only used right at the end to produce the final prediction.
We estimate a relative "risk" at the current time, and at location $x$ (a two dimensional vector) by $$ r(x) = \sum_i f(t_i) g(x-x_i) $$ where we have events which occurred $t_i$ time units in the past, and at location $x_i$. This is hence a combined time and space KDE method.
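The sum above can be sketched directly in NumPy. This is a minimal illustration rather than the library's implementation: the function name `relative_risk`, the exponential choice $f(t) = e^{-t/\text{time\_scale}}$, and the isotropic Gaussian $g$ with a fixed `bandwidth` are all assumptions made for the example.

```python
import numpy as np

def relative_risk(x, event_locations, event_ages, time_scale=1.0, bandwidth=50.0):
    """Evaluate r(x) = sum_i f(t_i) g(x - x_i) at a single 2D point x.

    Assumed kernels (illustrative only):
      f(t) = exp(-t / time_scale)
      g(v) = isotropic 2D Gaussian density with standard deviation `bandwidth`
    """
    x = np.asarray(x, dtype=float)
    event_locations = np.asarray(event_locations, dtype=float)  # shape (n, 2)
    event_ages = np.asarray(event_ages, dtype=float)            # shape (n,)
    # Time weight of each event: older events count for less
    time_weights = np.exp(-event_ages / time_scale)
    # Space weight of each event: events far from x count for less
    sq_dist = np.sum((event_locations - x) ** 2, axis=1)
    space_weights = np.exp(-sq_dist / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return float(np.sum(time_weights * space_weights))

locations = [[100.0, 100.0], [400.0, 400.0]]
ages = [0.0, 5.0]  # first event happened now, second five time units ago
risk_near_recent = relative_risk([100.0, 100.0], locations, ages)
risk_near_old = relative_risk([400.0, 400.0], locations, ages)
```

With these assumed kernels, the risk next to the recent event is larger than the risk next to the older one, even though both points sit exactly on an event.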
We then compute the average of $r$ in each grid cell to obtain the prediction (in practice we use a Monte Carlo approach, and sample $r$ at random locations in each grid cell).
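The Monte Carlo averaging step might look like the following sketch. The helper `monte_carlo_grid` and the toy surface `risk` are hypothetical names introduced here, and the row/column layout of the result is an assumption; the library's own sampling may differ in detail.

```python
import numpy as np

def risk(points, centre=(250.0, 250.0), bandwidth=100.0):
    """Toy stand-in for r: a single Gaussian bump; `points` has shape (n, 2)."""
    d2 = np.sum((points - np.asarray(centre)) ** 2, axis=-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def monte_carlo_grid(risk_func, xmin, ymin, grid_size, nx, ny, samples=20, rng=None):
    """Estimate the mean of `risk_func` over each square grid cell by random sampling."""
    rng = np.random.default_rng() if rng is None else rng
    intensity = np.empty((ny, nx))  # assumed layout: rows index y, columns index x
    for row in range(ny):
        for col in range(nx):
            # Bottom-left corner of this cell
            corner = np.array([xmin + col * grid_size, ymin + row * grid_size])
            # Uniformly distributed sample points inside the cell
            offsets = rng.random((samples, 2)) * grid_size
            intensity[row, col] = risk_func(corner + offsets).mean()
    return intensity

grid = monte_carlo_grid(risk, 0.0, 0.0, 50.0, 10, 10,
                        samples=20, rng=np.random.default_rng(0))
```

Increasing `samples` reduces the noise in each cell's estimate at the cost of more kernel evaluations, which is where most of the running time goes.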
Our code allows complete freedom in choosing $f$ and $g$; we provide some common choices. For the time component $f$:
For space we use a classical KDE method:
GaussianBase (which, with default parameters, replicates the behaviour of the scipy KDE method).
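For comparison, here is a short sketch of the scipy method itself, `scipy.stats.gaussian_kde`, applied to random two-dimensional points. It only shows what the scipy estimator does with its default bandwidth; it makes no claim about how the space kernel here is implemented internally.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
xs = rng.random(20) * 500
ys = rng.random(20) * 500

# gaussian_kde expects data of shape (dimensions, number_of_points)
kernel = gaussian_kde(np.vstack([xs, ys]))

# Evaluate the estimated density at the events themselves...
density_at_data = kernel(np.vstack([xs, ys]))
# ...and at a point far outside the region containing the data
density_far_away = kernel(np.array([[5000.0], [5000.0]]))
```

The density is highest near clusters of events and falls away rapidly outside the region containing the data.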
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import open_cp
import open_cp.kde as kde
In [2]:
# Generate some random data
import datetime
times = [datetime.datetime(2017,3,10) + datetime.timedelta(days=np.random.randint(0,10)) for _ in range(20)]
times.sort()
xc = np.random.random(size=20) * 500
yc = np.random.random(size=20) * 500
points = open_cp.TimedPoints.from_coords(times, xc, yc)
In [3]:
region = open_cp.RectangularRegion(0,500, 0,500)
predictor = kde.KDE(region=region, grid_size=50)
predictor.time_kernel = kde.ExponentialTimeKernel(1)
predictor.space_kernel = kde.GaussianBaseProvider()
predictor.data = points
gridpred = predictor.predict(samples=20)
In [4]:
fig, ax = plt.subplots(figsize=(10,10))
m = ax.pcolor(*gridpred.mesh_data(), gridpred.intensity_matrix)
ax.scatter(points.xcoords, points.ycoords, marker="+", color="black")
cb = plt.colorbar(m, ax=ax)
cb.set_label("Relative risk")
None
In [5]:
points.timestamps
Out[5]:
In [6]:
points.xcoords, points.ycoords
Out[6]:
You can see the dependence on time: the most recent event gives much more total risk than an event far in the past.
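To get a feel for how quickly the time weighting falls off, the sketch below tabulates an exponential weight for events between zero and nine days old. It assumes $f(t) = e^{-t/\text{scale}}$ with `scale = 1` measured in days; the exact normalisation and time unit used by `ExponentialTimeKernel` are not confirmed here.

```python
import numpy as np

scale = 1.0                   # assumed decay scale, in days
ages_in_days = np.arange(10)  # events from today back to nine days ago
weights = np.exp(-ages_in_days / scale)

# Weight of each event relative to an event that happened today
relative_to_newest = weights / weights[0]
```

Under this assumption a nine-day-old event contributes less than a thousandth of the risk of an event from today, so the prediction is dominated by the most recent events.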